A comparative analysis of HGSC and Celera human genome assemblies and gene sets
Abstract
MOTIVATION: Since the simultaneous publication of the human genome assemblies by the International Human Genome Sequencing Consortium (HGSC) and Celera Genomics, several comparisons have been made of various aspects of these two assemblies. In this work, we set out to provide a more comprehensive comparative analysis of the two assemblies and their associated gene sets.
RESULTS: The local sequence content of both draft genome assemblies has been similar since the early releases; however, it took a year for the quality of the Celera assembly to approach that of HGSC, suggesting an advantage of HGSC's hierarchical shotgun (HS) sequencing strategy over Celera's whole genome shotgun (WGS) approach. While similar numbers of ab initio predicted genes can be derived from both assemblies, Celera's Otto approach consistently generated larger, more varied gene sets than the Ensembl gene build system. A non-overlapping gene set has persisted across successive data releases from both groups. Since most of the genes unique to either genome assembly could be mapped back to the other assembly, we conclude that the gene set discrepancies reflect not differences in local sequence content but differences in the assemblies and, especially, in the gene-prediction methodologies.
Similar resources
Comparative analysis of human genome assemblies reveals genome-level differences.
Previous comparative analysis has revealed a significant disparity between the predicted gene sets produced by the International Human Genome Sequencing Consortium (HGSC) and Celera Genomics. To determine whether the source of this discrepancy was due to underlying differences in the genomic sequences or different gene prediction methodologies, we analyzed both genome assemblies in parallel. Us...
On the sequencing and assembly of the human genome.
On June 26, 2000, Celera Genomics and the International Human Genome Sequencing Consortium (HGSC) announced at the White House the completion of the first assembly of the human genome and the completion of a rough draft, respectively. In February of 2001, the two teams simultaneously published their analyses of the genome sequences generated (1, 2). The joint announcement and subsequent publicat...
More on the sequencing of the human genome.
The international Human Genome Project (HGP) and Celera Genomics published articles last year on the sequence of the human genome (1, 2). In a recent article (3), we analyzed aspects of the Celera article. We noted that the article did not report an assembly of Celera’s own data but rather reported only joint assemblies based on a data set that included the assembled genome sequence of the HGP...
Comparative bioinformatics analysis of a wild diploid Gossypium with two cultivated allotetraploid species
Background: Gossypium thurberi is a wild diploid species that has been used to improve cultivated allotetraploid cotton. G. thurberi belongs to the D genome and is an important wild bio-resource for cotton breeding and genetic research. To a certain degree, chloroplast DNA sequence information is a versatile tool for species identification and phylogenetic inference in plants. Different ch...
Whole genome computational comparative genomics: A fruitful approach for ascertaining Alu insertion polymorphisms.
Alu elements are the most active and predominant type of short interspersed elements (SINEs) in the human genome. Recently inserted polymorphic (for presence/absence) Alu elements contribute to genome diversity among different human populations, and they are useful genetic markers for population genetic studies. The objective of this study is to identify polymorphic Alu insertions through an in...
Journal title:
- Bioinformatics
Volume 19, Issue 13
Pages -
Publication date: 2003